Abstract. We present a two-part approach for verifying out-of-order execution.
First, the complexity of out-of-order issue and scheduling is handled by creating
an in-order abstraction of the out-of-order execution core. Second, incremental
flushing addresses the complexity difficulties encountered by automated abstraction
functions on very deep pipelines. We illustrate the techniques on a model of
a simple out-of-order processor core.


1  Introduction 

Formal verification of microprocessor designs using theorem proving aims at proving
that a processor model behaves as defined by an instruction-set architecture (ISA). The
ISA captures the programmer-level view of the machine. This approach requires an
abstraction function that relates the state of the processor model to the corresponding
state of the ISA. Finding this abstraction function manually for pipelined designs is
tedious and time consuming. In response, Burch and Dill devised an approach that
automatically generates the abstraction function by flushing the implementation state [3].
The technique has been extended to dual-issue and super-scalar architectures [7, 2, 15].

While formal verification techniques exist for pipelined and super-scalar architectures,
experience verifying out-of-order architectures is minimal. The distinct features
of out-of-order architectures challenge existing verification approaches. First, the
extended instruction parallelism in out-of-order architectures results in many complex
interactions between executing instructions. This greater complexity makes it very
difficult to devise an abstraction function. Second, large (> 40 element) buffers are used
to record and maintain the program order of instructions. Burch and Dill’s automated
pipeline flushing approach does not work for out-of-order architectures in practice
because the number of cycles required to empty the buffer completely is so large. The
logical formulas are too complex to manipulate in proofs and often too complex even
to construct.

We present a two-part approach that deals with the out-of-order scheduling logic and
the in-order buffering mechanisms separately. First, the implementation is modified to
derive an in-order abstraction. These modifications bypass the out-of-order logic and
result in instructions executing in order. By exploiting domain-specific knowledge, we
are able to establish a functional equivalence relation between the out-of-order
implementation and the abstraction. The second step of our technique shows that the in-order
abstraction is functionally equivalent to the ISA. This is accomplished via a technique
introduced in this paper that we call incremental flushing, based on the Burch-Dill
automatic flushing approach and the self-consistency technique of Jones et al. [8].
Incremental flushing reduces the verification complexity associated with flushing lengthy
pipelines. This technique is also applicable to verification of other deeply-pipelined
hardware designs, not just out-of-order microarchitectures.

We have created a simple model of an out-of-order execution core that we use to
illustrate our approach. Although our example is not representative of industrial-scale
designs, it captures essential features of out-of-order architectures: large queuing buffers,
resource allocation within the buffers, and data-path scheduling of execution resources.
Using the techniques presented here, we were able to verify it with the Stanford
Validity Checker (SVC) [1]. In particular, we have verified its correctness for any
(reasonable) scheduling algorithm.

2  Related  Work 

Sawada and Hunt’s theorem-proving approach uses a table of history variables, called a
micro-architectural execution trace table (MAETT) [14, 13]. The MAETT is an
intermediate abstraction that contains selected parts of the implementation as well as extra
history variables and variables holding abstracted values. It includes the ISA state and
the ISA transition function. A predicate relating the implementation and MAETT is
found by manual inspection and proven by induction to be an invariant on the execution
of the implementation. In our approach, the intermediate abstraction does not include
the ISA state, but is closer to the implementation in abstraction level. This minimizes the
manual work needed to find the relation between the implementation and abstraction.
We then use an incremental flushing technique to automatically generate the abstraction
function, significantly reducing the manual work required to relate the intermediate
abstraction to the ISA.

Damm and Pnueli generalize an ISA specification to a non-deterministic
abstraction [4]. It is then verified that the implementation satisfies the abstraction by manually
establishing and proving the appropriate invariants. They have applied their technique to
the Tomasulo algorithm [5], which has out-of-order instruction completion. In contrast,
our out-of-order model features in-order retirement. In our approach, the intermediate
abstraction executes instructions in order. Damm and Pnueli’s abstraction represents all
possible instruction sequences which observe dataflow dependencies. Applying their
method to architectures with in-order retirement would require manual proof by
induction that the intermediate abstraction satisfies the ISA. We automate this proof by
incremental flushing.

Henzinger et al. use Tomasulo’s algorithm to illustrate a method for manually
decomposing the proof obligation [6]. They provide abstract modules for parts of the
implementation. These modules correspond to implementation-internal transactions.
Similar to our approach, the abstractions are invariants on the implementation and are
extended with auxiliary variables. Again, our approach automates part of the abstraction
process.

McMillan model checks the Tomasulo algorithm by manually decomposing the
proof into smaller correctness proofs of the internal transactions that together form
one step of execution [11]. Furthermore, he uses a symmetry reduction technique to
extend the proof to a large number of execution units. Our proofs are also decomposed
into properties of internal transactions. In contrast to an automated model checking
approach, our theorem-proving based method is able to handle internal buffers of arbitrary
size.

Incremental flushing is related to the distributed systems work of Katz [10]. His
formalization deals with atomic, concurrent transactions which can be reordered into
a more convenient form for formal analysis without affecting the soundness of the
final result. However, the framework of distributed transactions cannot be directly
applied to the verification of microprocessor architectures, where the control logic dictates the
sequencing of internal transactions.

3  Preliminaries 

The desired behavior of a processor is defined by an instruction-set architecture (ISA).
The ISA represents the programmer-level view of the machine where instructions
execute sequentially. The ISA for our example is shown in Figure 1a. The simple state
consists of a register file (RF), while the next-state function is computed with an
execution unit (EU) that can execute any instruction. The ISA also accepts a bubble input
that leaves the state unchanged. Note that our ISA model does not include a program
counter or memory state, as these are also omitted from our simplified out-of-order
model.

Fig. 1. (a) The simple ISA model. (b) Instruction flow in our out-of-order execution core IMPL.
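
The ISA model described above can be made concrete with a minimal Python sketch. This is not the authors' Lisp-like HDL model: the instruction encoding `(dest, src, op)`, the register names, and the function names `isa_step` and `isa_run` are assumptions made for illustration. The single source operand mirrors the simplification used in the paper's model, and `None` stands for the bubble input.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical encoding: (destination register, source register, operation).
Instr = Tuple[str, str, Callable[[int], int]]
RegFile = Dict[str, int]


def isa_step(rf: RegFile, instr: Optional[Instr]) -> RegFile:
    """One ISA step: execute the instruction immediately; None is a bubble."""
    if instr is None:
        return dict(rf)  # a bubble leaves the state unchanged
    dest, src, op = instr
    new_rf = dict(rf)
    new_rf[dest] = op(rf[src])  # the EU computes the result from the source register
    return new_rf


def isa_run(rf: RegFile, program: List[Optional[Instr]]) -> RegFile:
    """Extend isa_step to a sequence of inputs."""
    for instr in program:
        rf = isa_step(rf, instr)
    return rf


def inc(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return 2 * x


# r1 := inc(r0); bubble; r0 := double(r1)
example = isa_run({"r0": 1, "r1": 0}, [("r1", "r0", inc), None, ("r0", "r1", double)])
```

In this sketch the whole architectural state is the register file, so the function returned by `isa_run` plays the role of the ISA next-state function extended to input sequences.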

Modern processors do not implement the ISA in this manner, because the
performance would be abysmal. In out-of-order architectures, instructions are fetched,
decoded, and sent to the execution core in program order. Internally, however, the core
executes instructions out-of-order, as allowed by data dependencies. This allows
independent instructions to execute concurrently. Finally, instruction results are written back
to architecturally-visible state (the register file) in the order they were issued.

Consider our example out-of-order execution core (IMPL) shown in Figure 1b. The
architectural register file (RF) contains the current state of the ISA-defined architectural
registers. When an instruction is issued, new entries are allocated in both the dispatch
and retirement buffers, and the register translation table (RTT) entry for the logical
register corresponding to the instruction destination is updated. The RTT is used to
locate the instruction’s source data. Instructions are dispatched, possibly out-of-order,
from the dispatch buffer (DB) to individual execution units when their operands are
ready and an execution unit is available. When an instruction finishes execution, the
result is written back to the retirement buffer (RB). This data is also bypassed into the
DB for instructions awaiting that particular result. Finally, the RB logic must ensure that
instruction results are retired (committed to architectural state) in the original program
order. When an RB entry is retired, the RTT is informed so that the logical register entry
corresponding to the instruction’s destination can be updated if necessary. IMPL also
accepts a special bubble flushing input in place of an instruction. Intuitively, a bubble
is similar to a NOP instruction but does not affect any state or consume any resources
after being issued.

We have made significant simplifying assumptions in our processor model: instructions
have only one source operand, and only one issue and one retire can occur each
cycle. Our model is out-of-order because the execution units have variable latency. We
also omit a “front-end” with fetch, decode, and branch prediction logic. Omitting these
features allowed our efforts to focus on the features which make the out-of-order
verification problem difficult: the out-of-order execution and the large effective depth of
the pipeline. The SVC verification reported in this paper used a model with unbounded
buffers.

4  The  Approach 

The goal of our verification approach is to prove that the out-of-order implementation
IMPL (as described by an HDL model) satisfies the ISA model. We define $\delta_I$ to be the
implementation next-state function, which takes a state $q_I$ and an input instruction $i$ and
returns a new state $q_I'$, i.e., $q_I' = \delta_I(q_I, i)$. We extend $\delta_I$ in the obvious way to operate
over input sequences $w = i_0 \ldots i_n$. We define $\delta_S$ similarly for ISA.

Let $\sigma$ be a size function that returns the number of currently executing instructions,
i.e., those that have been issued but not retired. We require that $\sigma(q_I^0) = 0$ for an
initial implementation state $q_I^0$. We define an instruction sequence $w$ to be completed iff
$\sigma(\delta_I(q_I^0, w)) = 0$, i.e., all instructions have been retired after executing $w$. We use the
projection function $\pi_{RF}(q_I)$ to denote the register file contents in state $q_I$. For clarity in
presentation, we define $q_{I1} =_{RF} q_{I2}$ to be $\pi_{RF}(q_{I1}) = \pi_{RF}(q_{I2})$, and we will sometimes use
$=_{RF}$ when the projection $\pi_{RF}$ is redundant on one side of the equality.

The overall correctness property for IMPL with respect to ISA is expressed as:

Correctness For every completed instruction sequence $w$ and initial state $q_I^0$:

$\delta_I(q_I^0, w) =_{RF} \delta_S(\pi_{RF}(q_I^0), w)$.

That is, the architecturally visible state in IMPL and ISA is identical after executing
any instruction sequence that retires all outstanding instructions in the implementation.
This is the same commuting property used by several approaches, including [3]. Note
that because our model is only an execution core, we are only checking the correctness
of the register file. A (future) verification of a more complete processor model could
check the program counter and memory.

We verify the correctness property by dealing with the out-of-order and in-order
parts of IMPL separately. First, we derive an in-order intermediate abstraction (ABS)
from IMPL. We then establish an equivalence relation between ABS and IMPL. In the
second step, we demonstrate functional equivalence between ABS and ISA. By
transitivity of equality of the final register file values, this establishes functional equivalence
between IMPL and ISA.
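
The transitivity argument can be written out as a short chain. This is a sketch under stated assumptions: $\delta_A$ denotes the ABS next-state function and $w_r$ a sequence of retirement inputs (both introduced in the next section), $=_{RF}$ denotes equality of register file contents, and the initial states $q_I^0$ and $q_A^0$ are taken to be consistent, i.e., to have the same register file.

```latex
\begin{align*}
\delta_I(q_I^0, w)
  &=_{RF} \delta_A(q_A^0, (w, w_r))
    && \text{(IMPL-ABS refinement, for some allowed } w_r\text{)} \\
  &=_{RF} \delta_S(\pi_{RF}(q_A^0), w)
    && \text{(ABS-ISA equivalence)} \\
  &= \delta_S(\pi_{RF}(q_I^0), w)
    && \text{(consistent initial states: } \pi_{RF}(q_I^0) = \pi_{RF}(q_A^0)\text{)}
\end{align*}
```

The first line is what the first verification step must supply and the second line is what the second step must supply; the last line only uses the consistency of the initial states.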

5  First  Step:  Functional  Equivalence  of  IMPL  and  ABS 

ABS is derived directly from IMPL by removing the “out-of-orderness” while
preserving the in-order buffering mechanism (Figure 2). In ABS, the DB has been removed:
instructions are executed immediately upon issue. However, the results are queued and
not written to architectural state until later. In the ABS model for this paper, instructions
are issued, executed, and written into an annotated RB in one clock. The write-only
annotated state in the RB contains some of the information lost with the DB removal and
aids in finding invariants. ABS accepts the same bubble input as IMPL. We add an
extra input to ABS called the retirement flag that signals when to retire the oldest
instruction. ABS thus has more possible behaviors than IMPL: while instruction results
are computed immediately in ABS, they may be buffered indefinitely in the annotated
RB before being committed to architectural state.
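
As an illustration of the abstraction just described, the following Python sketch models ABS as a register file plus a FIFO of computed results. It is an assumption-laden toy, not the authors' model: the `(dest, src, op)` encoding, the names `abs_step` and `abs_size`, and the way a source value is looked up (newest queued result first, then the register file) are all hypothetical, and the write-only annotations of the RB are omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple

Instr = Tuple[str, str, Callable[[int], int]]  # hypothetical (dest, src, op) encoding
AbsState = Tuple[Dict[str, int], List[Tuple[str, int]]]  # (RF, RB as a FIFO of results)


def abs_size(state: AbsState) -> int:
    """Number of issued but not yet retired instructions."""
    return len(state[1])


def abs_step(state: AbsState, instr: Optional[Instr], retire: bool) -> AbsState:
    """One ABS step: execute at issue, queue the result, retire only when told to."""
    rf, rb = dict(state[0]), list(state[1])
    entry = None
    if instr is not None:  # None is the bubble input
        dest, src, op = instr
        # Read the newest queued value of src, falling back to the register file.
        value = next((v for d, v in reversed(rb) if d == src), rf[src])
        entry = (dest, op(value))
    if retire:
        if not rb:
            raise ValueError("retirement input not allowed: no instruction is waiting")
        d, v = rb.pop(0)  # commit the oldest result to architectural state
        rf[d] = v
    if entry is not None:
        rb.append(entry)
    return rf, rb


def inc(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return 2 * x


q0: AbsState = ({"r0": 1, "r1": 0}, [])
q1 = abs_step(q0, ("r1", "r0", inc), False)     # issue, no retire
q2 = abs_step(q1, ("r0", "r1", double), False)  # issue, no retire
q3 = abs_step(q2, None, True)                   # bubble, retire
q4 = abs_step(q3, None, True)                   # bubble, retire
```

Results sit in the queue until the retirement flag releases them, which is why the register file in `q2` is still the initial one even though both results have already been computed.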


Fig.  2.  Instruction  flow  in  the  intermediate  abstraction. 

We must prove that IMPL is a refinement of ABS. We define $\delta_A$ to be the ABS
next-state function, which takes an initial state $q_A$ and a pair consisting of an input
instruction $i$ and a Boolean-valued retirement input $r$, and returns a new state $q_A'$, i.e.,
$q_A' = \delta_A(q_A, (i, r))$. The retirement input $r$ indicates in each step of execution whether or
not to retire a result. A retirement input $r$ is allowed by a state $q_A$ and input $i$ iff $r$ never
tells ABS to retire an instruction when one is not waiting. Note that it is allowable for $r$
to retire a waiting instruction. We extend the definition of $\delta_A$ to sequences of instruction
inputs $w$ and retirement inputs $w_r = r_0 \ldots r_n$ such that $q_A' = \delta_A(q_A, (w, w_r))$. (The pair of
sequences $(w, w_r)$ is easily derived from the corresponding sequence of pairs.)

We define states $q_I$ of IMPL and $q_A$ of ABS to be consistent when $q_I =_{RF} q_A$. We must
demonstrate that:

IMPL-ABS Refinement For every completed instruction sequence $w$ and every pair of
consistent initial states $q_I^0$, $q_A^0$, there exists a sequence of retirement inputs $w_r$ allowed
by $q_A^0$ and $w$ such that

$\delta_I(q_I^0, w) =_{RF} \delta_A(q_A^0, (w, w_r))$.


We prove that IMPL is a refinement of ABS by induction: we show that for each
step that IMPL makes, there exists an ABS step such that the register files are identical.
Forcing ABS to retire instructions in lock step with IMPL is straightforward. ABS
retirement inputs are generated from an oracle which observes whether or not IMPL
is retiring an instruction and instructs ABS to do the same thing. We establish $q_I =_{RF} q_A$
by proving a stronger property. We derive a relation $\mathcal{R}$ between IMPL and ABS states
such that $\mathcal{R}(q_I, q_A) \Rightarrow (q_I =_{RF} q_A)$. We demonstrate that $\mathcal{R}$ is a simulation relation [9]:

Proof Obligation 1 (IMPL-ABS Equivalence)

1. (Base Case) For every initial implementation state $q_I^0$, there exists an initial ABS
state $q_A^0$ such that $\mathcal{R}(q_I^0, q_A^0)$.

2. (Induction Step) For every instruction $i$, for every pair of consistent initial states
$q_I^0$, $q_A^0$, and for every instruction sequence $w$ and retirement sequence $w_r$ with
resulting states $q_I = \delta_I(q_I^0, w)$, $q_A = \delta_A(q_A^0, (w, w_r))$, there exists a retirement input
$r$ such that

$\mathcal{R}(q_I, q_A) \Rightarrow \mathcal{R}(\delta_I(q_I, i), \delta_A(q_A, (i, r)))$.

Deriving $\mathcal{R}$ is non-trivial. One way to construct $\mathcal{R}$ is to mechanically derive the weakest
invariant which implies $q_I =_{RF} q_A$. Of course, this technique blows up when applied
directly to a complex circuit.

The relation $\mathcal{R}$ is formed as a conjunction of the IMPL reachability invariant, the
ABS reachability invariant, and assertions relating the IMPL state with the ABS state.
The difficulties associated with deriving invariants are ubiquitous. We used an ad hoc
collection of domain-specific techniques we found to be quite effective. The process
of deriving and proving the reachable-state invariant for IMPL was simplified by
recognizing that the out-of-order mechanism in a given cycle consists of a number of
transactions, each of which operates on only part of the IMPL state. In IMPL, these are
issue, dispatch, writeback, and retire. The ABS reachability invariant is easily derived
from the IMPL reachability invariant, because ABS is essentially a simple IMPL. Some
IMPL state is not present in ABS, and other IMPL state has been renamed and is now
part of the annotated RB.

We  added  link  assertions  which  relate  partially  executed  instructions  in  the  DB  and 
RB  of  IMPL  to  their  counterparts  in  the  annotated  RB  of  ABS.  The  link  assertions 
ensure  that  the  partially  executed  instructions  in  the  implementation  always  have  the 
correct  value  or  the  information  needed  (pointers  or  data)  to  eventually  compute  the 
correct  value.  Run  times  and  memory  usage  for  proving  the  proof  obligations  on  our 
example  are  reported  in  Section  7. 
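
The lock-step oracle used in this section can be illustrated by simulation. The sketch below is a toy, not the verified IMPL: the `Impl` class merges the DB and RB into one list of tagged entries, uses a random choice as a stand-in for the scheduler, and relies on a hypothetical `(dest, src, op)` instruction encoding. It checks on concrete runs (it does not prove) that the register files agree after every step when ABS copies the retirement decision of IMPL.

```python
import random


def abs_step(state, instr, retire):
    """In-order abstraction: execute at issue, queue the result, retire on request."""
    rf, rb = dict(state[0]), list(state[1])
    entry = None
    if instr is not None:
        dest, src, op = instr
        value = next((v for d, v in reversed(rb) if d == src), rf[src])
        entry = (dest, op(value))
    if retire:
        d, v = rb.pop(0)
        rf[d] = v
    if entry is not None:
        rb.append(entry)
    return rf, rb


class Impl:
    """Toy out-of-order core: entries execute in any order but retire in order."""

    def __init__(self, rf):
        self.rf, self.rb, self.rtt, self.next_tag = dict(rf), [], {}, 0

    def step(self, instr, rng):
        # Retire: commit the oldest entry once its result has been written back.
        retired = bool(self.rb) and self.rb[0]["result"] is not None
        if retired:
            e = self.rb.pop(0)
            self.rf[e["dest"]] = e["result"]
            if self.rtt.get(e["dest"]) == e["tag"]:
                del self.rtt[e["dest"]]
        # Dispatch and writeback: a random "scheduler" may pick one ready entry.
        ready = [e for e in self.rb if e["result"] is None and e["operand"][0] == "val"]
        if ready and rng.random() < 0.7:
            e = rng.choice(ready)
            e["result"] = e["op"](e["operand"][1])
            for other in self.rb:  # bypass the result to waiting consumers
                if other["operand"] == ("tag", e["tag"]):
                    other["operand"] = ("val", e["result"])
        # Issue: allocate an entry, locate the source via the RTT, rename the destination.
        if instr is not None:
            dest, src, op = instr
            operand = ("tag", self.rtt[src]) if src in self.rtt else ("val", self.rf[src])
            if operand[0] == "tag":
                producer = next(e for e in self.rb if e["tag"] == operand[1])
                if producer["result"] is not None:
                    operand = ("val", producer["result"])
            self.rb.append({"tag": self.next_tag, "dest": dest, "op": op,
                            "operand": operand, "result": None})
            self.rtt[dest] = self.next_tag
            self.next_tag += 1
        return retired


def lockstep_ok(program, rf0, seed):
    """Run IMPL and ABS side by side; ABS retires exactly when IMPL does."""
    rng = random.Random(seed)
    impl, abs_state, pending = Impl(rf0), (dict(rf0), []), list(program)
    while pending or impl.rb:
        instr = pending.pop(0) if pending and rng.random() < 0.6 else None
        retired = impl.step(instr, rng)                  # observe IMPL
        abs_state = abs_step(abs_state, instr, retired)  # oracle: copy the retire decision
        if impl.rf != abs_state[0]:
            return False
    return True


def inc(x):
    return x + 1


def double(x):
    return 2 * x


program = [("r1", "r0", inc), ("r2", "r1", double), ("r0", "r2", inc),
           ("r1", "r1", double), ("r2", "r0", inc)]
results = [lockstep_ok(program, {"r0": 1, "r1": 2, "r2": 3}, seed) for seed in range(30)]
```

A simulation like this only samples schedules; the point of Proof Obligation 1 is to cover every schedule and every reachable state at once.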


[Figure 3 legend: issue, no retire ($\sigma' = \sigma + 1$); no issue, retire ($\sigma' = \sigma - 1$); issue, retire ($\sigma' = \sigma$); no issue, no retire ($\sigma' = \sigma$).]

Fig. 3. (a) A Max-n execution $\varepsilon_n$. (b) An equivalent non-diagonal execution $\varepsilon_{\hat{n}}$. (c) An equivalent
Max-1 execution $\varepsilon_1$. Labels $i_n$ and $r_n$ denote the issue and retirement of instruction number $n$.
The label $r_n \| i_n$ denotes simultaneous issue and retire. r : n is a shorthand for n cycles where
in each cycle, bubbles are issued and nothing is retired. The numbers indicate the sizes of each
state. The squares indicate the distance between $\varepsilon_{\hat{n}}$ and $\varepsilon_1$.


6  Second  Step:  Functional  Equivalence  of  ABS  and  ISA 

In  this  section,  we  introduce  incremental  flushing,  and  use  it  to  prove  that  ABS  satisfies 
ISA.  Formally,  we  desire  to  establish  that: 

ABS-ISA Equivalence For every completed instruction sequence $w$, initial ABS state
$q_A^0$, and sequence of retirement inputs $w_r$ allowed by $w$ and $q_A^0$:

$\delta_A(q_A^0, (w, w_r)) =_{RF} \delta_S(\pi_{RF}(q_A^0), w)$.

ABS contains an annotated RB that queues instruction results before they are
committed to architectural state. Recall that the Burch-Dill abstraction function flushes an
implementation (by inserting bubbles) for the number of clock cycles necessary to
completely expose the internal state. In the case of a simple five-stage pipeline, only
five steps are required to complete the partially executed instructions. Following this
approach with our model would compare a potentially full annotated RB with the ISA
model. The Burch-Dill flushing technique would unroll ABS to the depth of the
annotated RB, resulting in a logical expression too large for the decision procedure to
check.

Our incremental flushing approach overcomes this unmanageable complexity.
Instead of flushing the entire pipeline directly, a set of smaller, inductive flushing steps
is performed. Taken together, these proof obligations imply the monolithic flushing
operation. To illustrate, consider the graphical presentation of three different executions
of ABS in Figure 3. We define the execution of a system as the sequence of states that
the system passes through when executing a given pair of input sequences $(w, w_r)$. For
instance, the execution shown in Figure 3a is a result of executing the input sequence:

$(i_1, F)$, $(i_2, F)$, (bubble, F), $(i_3, F)$, (bubble, T), $(i_4, F)$,
$(i_5, T)$, (bubble, T), (bubble, F), $(i_6, T)$, (bubble, T), (bubble, T)


Apart  from  self-loops,  edges  are  only  traversed  when  instructions  are  issued  or  retired. 

We use $\varepsilon(q_A, (w, w_r))$ to denote the execution (sequence of states) resulting from
the application of $\delta_A$ to $q_A$ and $(w, w_r)$. We define $\mathit{last}(\varepsilon(q_A, (w, w_r)))$ as the last state
of the execution. Note that by definition:

$\mathit{last}(\varepsilon(q_A, (w, w_r))) = \delta_A(q_A, (w, w_r))$.

Each state in an execution is associated with the number of active instructions, defined
earlier as the size function $\sigma$. This is illustrated in Figure 3c. We call an execution where
$\sigma \le n$ for all states a Max-n execution (denoted $\varepsilon_n$). Accordingly, completely serialized
executions with at most one outstanding element are Max-1 executions (denoted $\varepsilon_1$).
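
The sizes along the example input sequence for Figure 3a can be recomputed with a few lines of Python. This is a sketch under the assumption that the execution starts from a state of size 0 and that a step first retires (if the flag is T) and then issues (if the input is not a bubble); the string labels and the name `size_trace` are hypothetical.

```python
from typing import List, Tuple


def size_trace(inputs: List[Tuple[str, bool]]) -> List[int]:
    """Return the size (issued but not retired) after each input pair."""
    sizes, size = [], 0
    for instr, retire in inputs:
        if retire:
            if size == 0:
                raise ValueError("retirement input not allowed: nothing to retire")
            size -= 1
        if instr != "bubble":
            size += 1
        sizes.append(size)
    return sizes


T, F = True, False
figure_3a = [("i1", F), ("i2", F), ("bubble", F), ("i3", F), ("bubble", T), ("i4", F),
             ("i5", T), ("bubble", T), ("bubble", F), ("i6", T), ("bubble", T), ("bubble", T)]
sizes = size_trace(figure_3a)
```

Under this reading the trace is 1, 2, 2, 3, 2, 3, 3, 2, 2, 2, 1, 0: the largest size reached is 3 and the final size is 0, so the sequence is completed.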

Our verification of ABS-ISA equivalence proceeds in two steps. First, we establish
that:

Incremental Flushing For every initial state $q_A^0$ and Max-n execution $\varepsilon_n(q_A^0, (w, w_r))$
there exists $(w^1, w_r^1)$ (derived from $w$, $w_r$ by reordering issues and retires) and a
corresponding Max-1 execution $\varepsilon_1(q_A^0, (w^1, w_r^1))$ such that:

$\mathit{last}(\varepsilon_n(q_A^0, (w, w_r))) = \mathit{last}(\varepsilon_1(q_A^0, (w^1, w_r^1)))$.


A  Max-1  execution  is  derived  from  a  Max-n  execution  by  reordering  the  issues  and 
retires.  This  notion  is  based  on  the  concept  of  self-consistency:  execution  results  should 
be  equivalent  for  certain  classes  of  inputs  [8].  The  final  results  of  Max-n  and  Max-1 
executions  will  be  identical  if  we  can  prove  inductively  that  reordering  issue  and  retires 
for  distinct  instructions  does  not  change  the  resulting  state.  Section  6.1  details  the  proof 
obligations  for  this  step. 
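
The reordering of a completed input pair into a Max-1 form can be sketched in Python and compared against the original run on a concrete example. Everything here is illustrative: `abs_step` is a toy ABS model with a hypothetical `(dest, src, op)` encoding, and `serialize` is one straightforward way to build the reordered pair, assuming the execution starts from a state with an empty RB.

```python
def abs_step(state, instr, retire):
    """Toy ABS step: execute at issue, queue the result, retire on request."""
    rf, rb = dict(state[0]), list(state[1])
    entry = None
    if instr is not None:
        dest, src, op = instr
        value = next((v for d, v in reversed(rb) if d == src), rf[src])
        entry = (dest, op(value))
    if retire:
        d, v = rb.pop(0)
        rf[d] = v
    if entry is not None:
        rb.append(entry)
    return rf, rb


def abs_run(state, w, w_r):
    """Run ABS on a pair of input sequences; also record the size after each step."""
    sizes = []
    for instr, retire in zip(w, w_r):
        state = abs_step(state, instr, retire)
        sizes.append(len(state[1]))
    return state, sizes


def serialize(w, w_r):
    """Reorder a completed pair into issue-then-retire fragments."""
    issued = [instr for instr in w if instr is not None]
    if sum(w_r) != len(issued):
        raise ValueError("input pair is not completed")
    w1, w1_r = [], []
    for instr in issued:
        w1 += [instr, None]    # issue the instruction, then a bubble
        w1_r += [False, True]  # no retire at issue, retire on the bubble
    return w1, w1_r


def inc(x):
    return x + 1


def double(x):
    return 2 * x


I1, I2, I3 = ("r1", "r0", inc), ("r2", "r1", double), ("r0", "r2", inc)
q0 = ({"r0": 1, "r1": 2, "r2": 0}, [])
w, w_r = [I1, I2, None, I3, None], [False, False, True, True, True]

final_n, sizes_n = abs_run(q0, w, w_r)
final_1, sizes_1 = abs_run(q0, *serialize(w, w_r))
```

On this example the original run reaches size 2 while the serialized run never exceeds size 1, and both end in the same state. The proof obligations of Section 6.1 are what justify this for all reachable states rather than for one sample.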

The second ABS-ISA verification step shows that all Max-1 executions produce the
same result as the ISA model.

Max-1 ABS-ISA Equivalence For every initial state $q_A^0$, and for every Max-1 execution
$\varepsilon_1$ corresponding to an instruction sequence $w^1$ and allowed retirement sequence $w_r^1$:

$\mathit{last}(\varepsilon_1(q_A^0, (w^1, w_r^1))) =_{RF} \delta_S(\pi_{RF}(q_A^0), w)$.

Proving  this  is  much  simpler  than  the  original  problem  of  directly  proving  ABS-ISA 
equivalence,  since  only  one  instruction  is  present  in  ABS  at  a  time.  The  proof  is  carried 
out  by  induction  on  the  length  of  instruction  sequences,  as  described  in  Section  6.2. 

6.1  Incremental  Flushing 

Space limitations prevent us from presenting the complete proofs justifying the
incremental flushing approach. We will, however, state the verification steps and resulting
proof obligations. We also include a proof sketch for the inductive step of incremental
flushing.

The  incremental  flushing  proof  step  can  be  split  up  into  three  proof  obligations, 
as  illustrated  in  Figure  4a-c.  Recall  that  Sa  takes  a  state,  an  input,  and  a  retirement 
input  flag.  We  use  T  and  F  for  the  values  of  the  retirement  input  flag,  where  T  forces 
ABS  to  retire  an  instruction,  and  F  prevents  it  from  doing  so.  The  first  proof  obligation 
demonstrates  the  independence  of  inserting  and  removing  elements  from  the  system: 


Fig.  4.  (a)  Proof  Obligation  2,  the  nodes  are  labeled  with  their  sizes,  (b)  Proof  Obligation  3.  (c) 
Proof  Obligation  4.  (d)  Proof  Obligation  5,  the  ISA  induction  step. 


Proof Obligation 2 (Reordering Step) For every reachable state $q_A$ s.t. $\sigma(q_A) \ge 1$, and
for every input $i$:

$\delta_A(\delta_A(q_A, (i, F)), (\mathit{bubble}, T)) = \delta_A(\delta_A(q_A, (\mathit{bubble}, T)), (i, F))$.

In  other  words,  we  must  show  that  the  relative  order  of  retirement  and  issue  is  immate¬ 
rial  for  distinct  instructions.  The  next  proof  obligation  requires  that  simultaneous  issue 
and  retirement  of  distinct  instructions  yields  the  same  result  as  a  sequential  retirement 
and  issue: 

Proof Obligation 3 (Parallel Correctness) For every reachable state $q_A$ s.t. $\sigma(q_A) \ge 1$,
and for every input $i$:

$\delta_A(q_A, (i, T)) = \delta_A(\delta_A(q_A, (\mathit{bubble}, T)), (i, F))$.

The final proof obligation illustrates that bubble inputs without retirement do not
change ABS state:

Proof Obligation 4 (Correctness of Self-Loops) For every reachable state $q_A$:

$\delta_A(q_A, (\mathit{bubble}, F)) = q_A$.

Taken together, these three proof obligations establish the Incremental Flushing step
of our verification, i.e., that every Max-n execution has a functionally equivalent Max-1
execution. We next give a brief sketch of the proof.

Proof Sketch: We assume the three Proof Obligations shown above and must show
that for every Max-n execution $\varepsilon_n$ there exists a corresponding Max-1 execution $\varepsilon_1$
such that

$\mathit{last}(\varepsilon_n(q_A^0, (w, w_r))) = \mathit{last}(\varepsilon_1(q_A^0, (w^1, w_r^1)))$.

We perform the proof in two steps, as illustrated in Figure 3. Given an execution $\varepsilon_n$
(Figure 3a) we first show that we can construct a “non-diagonal” execution $\varepsilon_{\hat{n}}$
(Figure 3b) from $\varepsilon_n$ that has neither diagonals nor self-loops, and such that

$\mathit{last}(\varepsilon_n(q_A^0, (w, w_r))) = \mathit{last}(\varepsilon_{\hat{n}}(q_A^0, (w, w_r)))$.

This is proved by induction on the length of $\varepsilon_n$. We use Proof Obligation 3 to replace
any diagonal edge with horizontal and vertical edges. Proof Obligation 4 is used to
remove the self-loops.


The second step shows that we can derive a Max-1 sequence $\varepsilon_1$ (Figure 3c) such
that

$\mathit{last}(\varepsilon_n(q_A^0, (w, w_r))) = \mathit{last}(\varepsilon_1(q_A^0, (w^1, w_r^1)))$.

We prove this by induction on the distance between the non-diagonal Max-n execution
$\varepsilon_{\hat{n}}$ and the Max-1 execution $\varepsilon_1$, where distance is the number of “squares” that separate
the two executions. For example, eight squares separate the executions in Figures 3b
and 3c. We repeatedly apply Proof Obligation 2, “folding” the Max-n execution $\varepsilon_{\hat{n}}$
back to the corresponding Max-1 execution $\varepsilon_1$. This is possible because the input
sequences resulting in $\varepsilon_n$ and $\varepsilon_{\hat{n}}$ are completed (defined in Section 4). Each folding is a
reordering of independent retires and issues.

End  Proof  Sketch. 

Note  that  each  folding  is  a  rewrite  of  the  execution.  It  is  easy  to  see  that  Proof 
Obligations  2-4  together  are  a  confluent  (Church-Rosser)  set  of  rewrite  rules,  where 
the  Max-1  execution  is  the  unique  normal  form. 
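
Proof Obligations 2-4 can be exercised on sampled states with a short Python script. This is random testing of a toy model, not the SVC proof: `abs_step`, the `(dest, src, op)` encoding, the register and operation sets, and the way reachable states are sampled (by running random allowed inputs from a state with an empty RB) are all assumptions made for illustration.

```python
import random

REGS = ["r0", "r1", "r2"]
OPS = [lambda x: x + 1, lambda x: 2 * x, lambda x: x - 3]


def abs_step(state, instr, retire):
    """Toy ABS step: execute at issue, queue the result, retire on request."""
    rf, rb = dict(state[0]), list(state[1])
    entry = None
    if instr is not None:
        dest, src, op = instr
        value = next((v for d, v in reversed(rb) if d == src), rf[src])
        entry = (dest, op(value))
    if retire:
        d, v = rb.pop(0)
        rf[d] = v
    if entry is not None:
        rb.append(entry)
    return rf, rb


def random_instr(rng):
    return (rng.choice(REGS), rng.choice(REGS), rng.choice(OPS))


def random_reachable_state(rng, steps=20):
    """Reach a state by applying random allowed inputs to an initial state."""
    state = ({r: rng.randint(0, 9) for r in REGS}, [])
    for _ in range(steps):
        instr = random_instr(rng) if rng.random() < 0.6 else None
        retire = bool(state[1]) and rng.random() < 0.4  # only retire if one is waiting
        state = abs_step(state, instr, retire)
    return state


def obligations_hold(trials=500, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        q, i = random_reachable_state(rng), random_instr(rng)
        if abs_step(q, None, False) != q:                 # Proof Obligation 4
            return False
        if len(q[1]) >= 1:
            issue_retire = abs_step(abs_step(q, i, False), None, True)
            retire_issue = abs_step(abs_step(q, None, True), i, False)
            if issue_retire != retire_issue:              # Proof Obligation 2
                return False
            if abs_step(q, i, True) != retire_issue:      # Proof Obligation 3
                return False
    return True


all_hold = obligations_hold()
```

Each check compares complete ABS states (register file and queue), matching the plain equalities in the obligations; a decision procedure such as SVC establishes them symbolically for all reachable states instead of sampling.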

6.2 Max-1 ABS-ISA Equivalence

The final verification step is to show that all Max-1 executions of ABS are functionally
equivalent with ISA. We can divide the Max-1 execution up into issue-retire fragments
that are simple “steps” in the graphical illustration. The proof is a simple induction on
the number of these fragments, comparing the execution and retirement of an arbitrary
instruction from an arbitrary ABS Max-1 state with the result that is retired by ISA.
This is illustrated in Figure 4d. Formally:

Proof Obligation 5 (ABS-ISA Induction) For every initial ABS state $q_A^0$ and every
instruction $i$:

$\delta_A(\delta_A(q_A^0, (i, F)), (\mathit{bubble}, T)) =_{RF} \delta_S(\pi_{RF}(q_A^0), i)$.

Because we have previously shown that a functionally equivalent Max-1 execution can
be derived from an arbitrary Max-n execution, this step completes the proof of ABS-ISA
equivalence.
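
Proof Obligation 5 compares one issue-retire fragment of ABS with one ISA step. The following Python sketch checks this on a few concrete cases; the toy `abs_step` and `isa_step` models and the `(dest, src, op)` encoding are hypothetical, and an initial ABS state is taken to be one with an empty RB.

```python
def isa_step(rf, instr):
    """Toy ISA step: execute the instruction immediately; None is a bubble."""
    new_rf = dict(rf)
    if instr is not None:
        dest, src, op = instr
        new_rf[dest] = op(rf[src])
    return new_rf


def abs_step(state, instr, retire):
    """Toy ABS step: execute at issue, queue the result, retire on request."""
    rf, rb = dict(state[0]), list(state[1])
    entry = None
    if instr is not None:
        dest, src, op = instr
        value = next((v for d, v in reversed(rb) if d == src), rf[src])
        entry = (dest, op(value))
    if retire:
        d, v = rb.pop(0)
        rf[d] = v
    if entry is not None:
        rb.append(entry)
    return rf, rb


def po5_holds(rf, instr):
    """Issue without retiring, then retire on a bubble; compare register files with ISA."""
    q0 = (dict(rf), [])  # initial ABS state: nothing in flight
    rf_abs, rb = abs_step(abs_step(q0, instr, False), None, True)
    return rb == [] and rf_abs == isa_step(rf, instr)


def inc(x):
    return x + 1


def double(x):
    return 2 * x


cases = [({"r0": 1, "r1": 5}, ("r1", "r0", inc)),
         ({"r0": 3, "r1": 0}, ("r0", "r0", double)),
         ({"r0": 0, "r1": 9}, ("r0", "r1", inc))]
checks = [po5_holds(rf, instr) for rf, instr in cases]
```

Only the register files are compared, mirroring the projection $\pi_{RF}$ in the obligation; the second case also covers an instruction whose source and destination registers coincide.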

7  Results 

We have mechanically checked Proof Obligations 1-5 for our models using the Stanford
Validity Checker (SVC). The three models (IMPL, ABS, and ISA) and the proof
obligations were written in a Lisp-like HDL. The proof formulas were constructed by
symbolically simulating the models in Lisp. SVC was invoked through a foreign-function
interface to decide the validity of the formulas. SVC’s built-in support for linear
arithmetic was used to model buffer pointers for the IMPL, RB, and ABS annotated RB. We
also extended SVC with special read and write updates to support the writeback to the
associative memory in the dispatch buffer.

The total CPU run times and number of case splits required are enumerated in
Figure 5. The number of case splits is a rough indicator of the relative complexity of the
simplified formula.

Fig. 5. (a) SVC run-times and number of case splits required for Proof Obligation 1, specified for
each IMPL transaction. (b) SVC run-times and case splits for the verification of ABS. All runs
performed on a 200-MHz Intel Pentium Pro system running Redhat Linux.


8  Discussion 

Our work addresses two of the major problems in symbolic verification of out-of-order
processor designs: the complexity of the out-of-order scheduling logic and the deep
effective length of the pipeline. While our IMPL example is far simpler than an
actual out-of-order implementation, it is representative of the architectural features which
make out-of-order verification difficult for existing techniques.

There is still much work to be accomplished in addressing the complexity
limitations encountered by formal methods on practical industrial designs. As these problems
are solved, we expect that our approach will be directly applicable. We also anticipate
that the incremental flushing approach will find use in a wide variety of verification
problems involving very deep pipelines, such as digital-signal processing.

We are currently formalizing the incremental flushing theory in the PVS theorem
prover [12]. For each new design, PVS will automatically instantiate the proof
obligations and pass them to SVC for automatic verification.